## IN THE CLAIMS

Please amend the claims as follows:

- 1. (Currently Amended) An apparatus comprising:
  - a memory interface;
- a first plurality of queues connected to the memory interface, including a first queue and a second queue, each of the first plurality of queues for holding a plurality of pending memory requests;

one or more instruction-processing circuits, wherein each instruction-processing circuit is operatively coupled [[to]] through the plurality of queues to the memory interface and wherein each of the plurality of instruction-processing circuits that inserts one or more memory requests into at least one of the queues based on a first memory operation instruction, that specifies a memory operation, and that inserts a first synchronization marker into the first queue and inserts a second synchronization marker into the second queue based on a second synchronization operation instruction, that specifies a synchronization operation, and that and inserts one or more memory requests into at least one of the queues based on a third second memory operation instruction; that specifies a memory operation[[;]] and

a first synchronization circuit, operatively coupled to the first plurality of queues, that selectively halts processing of further memory requests from the first queue based on the first synchronization marker reaching a predetermined point in the first queue until the corresponding second synchronization marker reaches a predetermined point in the second queue[[.]];

wherein each of the memory requests is a memory reference, wherein the memory reference is generated as a result of execution of instructions by the instruction-processing circuits.

(Currently Amended) The apparatus of claim 1, wherein the plurality of queues is within 2. a processor, and wherein the first queue is used for only synchronization markers and vector memory references, requests and synchronizations[[,]] and the second queue is used for only synchronization markers and scalar memory references. requests and synchronizations[[.]]

- 3. (Currently Amended) The apparatus of claim 2, wherein the first synchronization operation instruction is an Lsync-type instruction.
- 4. (Currently Amended) The apparatus of claim 2, wherein the first synchronization operation instruction is an Lsync V,S-type instruction.
- 5. (Currently Amended) The apparatus of claim 2, wherein, for a second synchronization operation instruction, a corresponding synchronization marker is inserted to in only the first queue.
- 6. (Original) The apparatus of claim 5, wherein the second synchronization instruction is an Lsync-type instruction.

# 7-8. (Canceled)

- 9. (Currently Amended) The apparatus of claim [[1]] 2, further comprising[[:]] wherein the first queue includes two subqueues, including a first subqueue and a second subqueue, wherein the first subqueue is for holding the vector memory references and synchronization markers associated with the vector memory references and wherein the second subqueue is a fifth queue for holding a plurality of write store data elements and synchronization markers associated with the store data elements, wherein each write store data element in the second subqueue corresponds to a one of the memory request requests in the first queue subqueue, and wherein the write store data elements are loaded into the fifth queue second subqueue decoupled from the loading of the memory requests into the first queue subqueue.
- 10. (Currently Amended) The apparatus of claim [[1]] 4, further comprising[[:]] wherein the instruction-processing circuits include a data cache[[;]] and an external cache[[,]] and wherein first synchronization instruction is an the Lsync V,S type instruction[[,]] prevents preventing subsequent scalar references from accessing the data cache until all vector references have been

sent to the an external cache and all vector writes have caused any necessary invalidations of the data cache.

#### 11. (Currently Amended) A method comprising:

providing a memory interface;

providing a first plurality of queues connected to the memory interface, including a first queue and a second queue, each of the first plurality of queues for holding a plurality of pending memory requests;

providing one or more instruction-processing circuits, wherein each instructionprocessing circuit is operatively coupled through the plurality of queues to the memory interface;

inserting one or more memory requests into at least one of the queues based on a first memory operation instruction that specifies a memory operation executed in one of the instruction-processing circuits;

inserting based on a second instruction that specifies a synchronization operation inserting a first synchronization marker into the first queue and inserting a second synchronization marker into the second queue <u>based on a synchronization operation ins</u>truction executed in one of the instruction-processing circuits; and

inserting one or more memory requests into at least one of the queues based on a third second memory operation instruction that specifies a memory operation executed in one of the instruction-processing circuits;

processing memory requests from the first queue; and

selectively halting further processing of memory requests from the first queue based on the first synchronization marker reaching a predetermined point in the first queue until the corresponding second synchronization marker reaches a predetermined point in the second queue[[.]];

wherein each of the memory requests is a memory reference.

(Currently Amended) The method of claim 11, wherein the plurality of queues is within 12. a first processor, and wherein the inserting to the first queue is for stores only synchronization marks and vector memory references, requests and synchronizations[[,]] and the inserting to

wherein the second queue is for stores only synchronization marks and scalar memory

references. requests and synchronizations[[.]]

13. (Currently Amended) The method of claim 12, wherein the first synchronization

operation instruction is an Lsync-type instruction.

(Currently Amended) The method of claim 12, wherein the first synchronization 14.

operation instruction is an Lsync V,S-type instruction.

15-18. (Canceled)

19. (Currently Amended) The method of claim [[11]] 12, further comprising[[:]] wherein the

first queue includes two subqueues, including a first subqueue and a second subqueue, wherein

the first subqueue stores the vector memory references and synchronization markers associated

with the vector memory references and wherein the second subqueue stores queueing a plurality

of write store data elements to a fifth queue and synchronization markers associated with the

store data elements, wherein each write store data element in the second subqueue corresponds to

[[a]] one of the memory request requests in the first queue subqueue, and wherein the write store

data elements are inserted into the fifth queue second subqueue decoupled from the inserting of

the memory requests into the first queue subqueue.

20. (Currently Amended) The method of claim [[11]] 14, further comprising[[:]] wherein

providing the instruction-processing circuits includes providing a data cache and an external

eache[[,]] wherein first-synchronization instruction is an performing the Lsync V,S type

instruction[[; and]] includes preventing subsequent scalar references from accessing the data

cache until all vector references have been sent to [[the]] an external cache and all vector writes

have caused any necessary invalidations of the data cache based on the first-synchronization

instruction.

# 21. (Currently Amended) An apparatus comprising:

# a memory interface;

a first plurality of queues <u>connected to the memory interface</u>, including a first queue and a second queue, each of the first plurality of queues for holding a plurality of pending memory requests;

one or more instruction-processing circuits, wherein each instruction-processing circuit is operatively coupled through the plurality of queues to the memory interface and wherein each of the plurality of instruction-processing circuits includes:

means for inserting one or <u>more</u> memory requests into at least one of the queues based on a first <u>memory operation</u> instruction that specifies a memory operation <u>executed</u> in one of the instruction-processing <u>circuits</u>;

means for[[,]] based on a second instruction that specifies a synchronization operation inserting a first synchronization marker into the first queue and inserting a second synchronization marker into the second queue based on a synchronization operation instruction executed in one of the instruction-processing circuits; and

means for inserting one or <u>more</u> memory requests into at least one of the queues based on a <u>third second memory operation</u> instruction <u>that specifies a memory operation</u> <u>executed in one of the instruction-processing circuits</u>;

means for processing memory requests from the first queue; and

means for selectively halting further processing of memory requests from the first queue based on the first synchronization marker reaching a predetermined point in the first queue until the corresponding second synchronization marker reaches a predetermined point in the second queue[[.]];

wherein each of the memory requests is a memory reference.

22. (Currently Amended) The apparatus of claim 21, wherein the plurality of queues is within a first processor, and wherein the means for inserting to the first queue operates for only vector memory requests and synchronizations, and [[the]] means for inserting to the second queue operates for only scalar memory requests and synchronizations.

Dkt: 1376.733US1

23. (Currently Amended) The apparatus of claim 22, wherein the [[first]] synchronization operation instruction is an Lsync-type instruction.

#### 24. (New) A system comprising:

a plurality of processors, including a first processor and a second processor, wherein each of the processors includes:

a memory interface;

a plurality of Lsync queues connected to the memory interface, including a first Lsync queue and a second Lsync queue, each of the plurality of Lsync queues for holding a plurality of pending memory requests;

one or more instruction-processing circuits, wherein each instruction-processing circuit is operatively coupled through the plurality of Lsync queues to the memory interface and wherein each of the plurality of instruction-processing circuits inserts one or more memory requests into at least one of the Lsync queues based on a first memory operation instruction, inserts a first Lsync synchronization marker into the first Lsync queue and inserts a second Lsync synchronization marker into the second Lsync queue based on a synchronization operation instruction, and inserts one or more memory requests into at least one of the Lsync queues based on a second memory operation instruction; and

a Lsync synchronization circuit, operatively coupled to the plurality of Lsync queues, that selectively halts processing of further memory requests from the first Lsync queue based on the first Lsync synchronization marker reaching a predetermined point in the first Lsync queue until the corresponding second Lsync synchronization marker reaches a predetermined point in the second Lsync queue; and

one or more Msync circuits, wherein each of the Msync circuits is connected to the plurality of processors and wherein each of the Msync circuits includes:

a plurality of Msync queues, including a first Msync queue and a second Msync queue, each of the plurality of Msync queues for holding a plurality of pending memory requests received from the Lsync queues, wherein the first Msync queue stores only Msync synchronization markers and memory requests from the first processor, and the

second Msync queue stores only Msync synchronization markers and memory requests from the second processor; and

an Msync synchronization circuit, operatively coupled to the plurality of Msync queues, that selectively halts further processing of the memory requests from the first Msync queue based on an Msync synchronization marker reaching a predetermined point in the first Msync queue until a corresponding Msync synchronization marker from the second processor reaches a predetermined point in the second Msync queue; wherein each of the memory requests is a memory reference, wherein the memory reference is generated as a result of execution of instructions by instruction-processing circuits in each processor.

- (New) The system of claim 24, wherein the Msync synchronization circuit includes a 25. plurality of stall lines, wherein each of the stall lines is connected to one of the plurality of Msync queues and wherein each of the stall lines is for halting further processing of the memory requests from a corresponding Msync queue.
- 26. (New) The system of claim 24, wherein each processor includes a data cache and wherein each Msync synchronization circuit includes an external cache, wherein the data cache and the external cache are used to perform an Lsync V,S type instruction, wherein the Lsync V,S type instruction prevents subsequent scalar references from accessing the data cache until all vector references have been sent to the external cache in a corresponding Msync synchronization circuit and all vector writes have caused any necessary invalidations of the data cache.

#### 27. (New) A method comprising:

providing a plurality of processors, including a first processor and a second processor, wherein each of the processors includes a memory interface, a plurality of Lsync queues connected to the memory interface, including a first Lsync queue and a second Lsync queue, and one or more instruction-processing circuits, each of the instruction-processing circuits operatively coupled through the plurality of Lsync queues to the memory interface;

providing one or more Msync circuits, wherein each of the Msync circuits is connected to the plurality of processors and wherein each of the Msync circuits includes a plurality of Msync queues, including a first Msync queue and a second Msync queue, each of the plurality of Msync queues operatively coupled to the plurality of Lsync queues in one of the plurality of processors;

inserting one or more memory requests into at least one of the Lsync queues based on a first memory operation instruction executed in one of the instruction-processing circuits;

inserting a first Lsync synchronization marker into the first Lsync queue and inserting a second Lsync synchronization marker into the second Lsync queue based on a synchronization operation instruction executed in one of the instruction-processing circuits;

inserting one or more memory requests into at least one of the Lsync queues based on a second memory operation instruction executed in one of the instruction-processing circuits;

processing memory requests from the first Lsync queue;

selectively halting further processing of memory requests from the first Lsync queue based on the first Lsync synchronization marker reaching a predetermined point in the first Lsync queue until the corresponding second Lsync synchronization marker reaches a predetermined point in the second Lsync queue;

inserting Msync synchronization markers and memory requests received from the Lsync queues in the first processor into the first Msync queue;

inserting Msync synchronization markers and memory requests received from the Lsync queues in the second processor into the second Msync queue; and

selectively halting further processing of the memory requests from the first Msync queue based on an Msync synchronization marker reaching a predetermined point in the first Msync queue until a corresponding Msync synchronization marker from the second processor reaches a predetermined point in the second Msync queue;

wherein each of the memory requests is a memory reference.

28. (New) The method of claim 27, wherein selectively halting further processing of the memory requests from the first Msync queue includes sending a stall signal to the Msync queues. Filing Date: August 18, 2003

Title: MULTISTREAM PROCESSING MEMORY-AND BARRIER-SYNCHRONIZATION METHOD AND APPARATUS

29. (New) The method of claim 27, wherein selectively halting further processing of memory requests from the first Lsync queue includes performing an Lsync V,S type instruction, wherein performing the Lsync V,S type instruction includes preventing subsequent scalar references from accessing a data cache in the processor until all vector references have been sent to an external cache in a corresponding Msync synchronization circuit and all vector writes have caused any necessary invalidations of the data cache.